Ways to scale a mod_perl site

on 16.09.2009 17:49:59 by Igor Chudov

My algebra.com server serves about 77k pageviews and a little over a million
object requests per day (with half of them served in just 4 hours). I
currently peak at 35 requests per second.
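
As a rough sanity check on these figures, the sketch below compares the all-day average with the rate during the busy four hours. It assumes "a little over a million" means exactly 1,000,000 requests, which is a simplification and not a measured number.

```perl
use strict;
use warnings;

# Assumption: "a little over a million" is rounded down to 1,000,000.
my $requests_per_day = 1_000_000;

# Average rate across the whole day.
my $avg_rate = $requests_per_day / (24 * 3600);

# Half of the requests arrive within a 4-hour window.
my $busy_rate = ($requests_per_day / 2) / (4 * 3600);

printf "average:    %.1f requests/second\n", $avg_rate;
printf "busy hours: %.1f requests/second\n", $busy_rate;
```

Under that assumption the busy-hours rate comes out at roughly 35 requests per second, which is consistent with the peak quoted above.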

I use mod_perl, mysql, and perlbal with everything running on one server.

The server has a solid state disk to hold mysql data.

I believe that it can handle 3x-5x more traffic all by itself. However, I am
thinking of ways to scale up a mod_perl installation.

1) Use a load balancer like perlbal (I am already doing that)
2) Separate a MySQL database server from webservers.
3) Being enabled by item 2, add more webservers and balancers
4) Create a separate database for cookie data (Apache::Session objects) ???
-- not sure if good idea --

(next level)

5) Use a separate database handle for read-only database requests (SELECT),
as opposed to INSERTs and UPDATEs. Use replication to access multiple slave
servers for read-only data, and access the master only for INSERT, UPDATE,
and DELETE.
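
Item 5 could be sketched roughly as follows. This is only an illustration of the routing decision: the function name and the role names 'master' and 'replica' are hypothetical, and in a real application each role would map to its own DBI handle.

```perl
use strict;
use warnings;

# Hypothetical helper: decide which kind of database handle a statement
# should use. The role names are placeholders for real DBI handles.
sub role_for_statement {
    my ($sql) = @_;
    # Plain SELECTs can go to a replica. Everything else, including
    # SELECT ... FOR UPDATE, goes to the master.
    return 'replica'
        if $sql =~ /\A\s*SELECT\b/i && $sql !~ /\bFOR\s+UPDATE\b/i;
    return 'master';
}

for my $sql (
    'SELECT name FROM users WHERE id = ?',
    'UPDATE users SET name = ? WHERE id = ?',
    'SELECT balance FROM accounts WHERE id = ? FOR UPDATE',
) {
    printf "%-7s : %s\n", role_for_statement($sql), $sql;
}
```

Because replication is usually asynchronous, a read that must see a row written a moment ago may still need to go to the master; the sketch does not handle that case.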

Any thoughts?

Re: Ways to scale a mod_perl site

on 16.09.2009 18:05:12 by mpeters

On 09/16/2009 11:49 AM, Igor Chudov wrote:

> 1) Use a load balancer like perlbal (I am already doing that)

A load balancer is good but so are proxies. If you can separate your
application server from the server that serves static content then
you'll get a boost even if they are on the same machine.

> 2) Separate a MySQL database server from webservers.

This is probably the first and easiest thing you should do.

> 3) Being enabled by item 2, add more webservers and balancers
> 4) Create a separate database for cookie data (Apache::Session objects)
> ??? -- not sure if good idea --

I've never seen the need to do that. In fact, I would suggest you drop
sessions altogether if you can. If you need any per-session information
then put it in a cookie. If you need this information to be tamper-proof
then you can create a hash of the cookie's data that you store as part
of the cookie. If you can reduce the # of times that each request needs
to actually hit the database you'll have big wins.

> 5) Use a separate database handle for readonly database requests
> (SELECT), as opposed to INSERTS and UPDATEs. Use replication to access
> multiple slave servers for read only data, and only access the master
> for INSERT and UPDATE and DELETE.

Reducing DB usage is more important than this. Also, before you go down
that road you should look at adding a caching layer to your application
(memcached is a popular choice).

--
Michael Peters
Plus Three, LP

Re: Ways to scale a mod_perl site

on 16.09.2009 18:13:22 by Brad Van Sickle

>
>> 3) Being enabled by item 2, add more webservers and balancers
>> 4) Create a separate database for cookie data (Apache::Session objects)
>> ??? -- not sure if good idea --
>
> I've never seen the need to do that. In fact, I would suggest you drop
> sessions altogether if you can. If you need any per-session
> information then put it in a cookie. If you need this information to
> be tamper-proof then you can create a hash of the cookie's data that
> you store as part of the cookie. If you can reduce the # of times that
> each request needs to actually hit the database you'll have big wins.
>
>

Can I get you to explain this a little more? I don't see how this could
be used for truly secure sites because I don't quite understand how
storing a hash in a plain text cookie would be secure.

The thing I hate most about my "secure" applications is the fact that I
have to read the DB at the start of every request to ensure that the
session cookie is valid and to extract information about the user from
the session table using the session ID stored in the cookie. Hitting
the DB is the quickest way to kill performance and scalability in my
experience. I know a lot of true app servers (WebSphere, etc.)
store this data in cached memory, but I was unaware that there might be
an option for doing this without using a DB with mod_perl.

Re: Ways to scale a mod_perl site

on 16.09.2009 18:20:31 by Igor Chudov

On Wed, Sep 16, 2009 at 11:05 AM, Michael Peters wrote:

> On 09/16/2009 11:49 AM, Igor Chudov wrote:
>
>> 1) Use a load balancer like perlbal (I am already doing that)
>
> A load balancer is good but so are proxies. If you can separate your
> application server from the server that serves static content then you'll
> get a boost even if they are on the same machine.
>

I have very little static content. Even images are generated. My site
generates images of math formulae such as (x-1)/(x+1) on the fly.

>> 2) Separate a MySQL database server from webservers.
>
> This is probably the first and easiest thing you should do.
>

Agreed.

>> 3) Being enabled by item 2, add more webservers and balancers
>> 4) Create a separate database for cookie data (Apache::Session objects)
>> ??? -- not sure if good idea --
>
> I've never seen the need to do that. In fact, I would suggest you drop
> sessions altogether if you can. If you need any per-session information then
> put it in a cookie. If you need this information to be tamper-proof then you
> can create a hash of the cookie's data that you store as part of the cookie.
> If you can reduce the # of times that each request needs to actually hit the
> database you'll have big wins.
>

I use sessions to keep users logged on. So the cookie is just an ID, and the
sessions table stores data such as authenticated userid.

I will double check, however, whether I give people sessions even if they
are not logged in.

Or maybe I can give them a cookie that will say "I am not logged in, do not
bother looking up my session".
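
A minimal sketch of that idea, assuming the session ID travels in a cookie named session_id (a hypothetical name) and that the session cookie is only issued at login, so its absence already means "not logged in" without a special marker cookie:

```perl
use strict;
use warnings;

# Hypothetical cookie name; the real application may use a different one.
my $SESSION_COOKIE = 'session_id';

# Parse a raw Cookie request header into a name => value hash.
sub parse_cookie_header {
    my ($header) = @_;
    my %cookies;
    for my $pair (split /;\s*/, $header // '') {
        my ($name, $value) = split /=/, $pair, 2;
        next unless defined $name && length $name && defined $value;
        $cookies{$name} = $value;
    }
    return \%cookies;
}

# Return the session ID to look up, or undef when the visitor is anonymous
# and the sessions table does not need to be queried at all.
sub session_id_to_look_up {
    my ($header) = @_;
    my $id = parse_cookie_header($header)->{$SESSION_COOKIE};
    return (defined $id && length $id) ? $id : undef;
}

for my $header ('theme=dark', 'theme=dark; session_id=abc123') {
    my $id = session_id_to_look_up($header);
    print defined $id
        ? "look up session $id\n"
        : "anonymous, skip the database\n";
}
```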

Hm


>> 5) Use a separate database handle for readonly database requests
>> (SELECT), as opposed to INSERTS and UPDATEs. Use replication to access
>> multiple slave servers for read only data, and only access the master
>> for INSERT and UPDATE and DELETE.
>
> Reducing DB usage is more important than this. Also, before you go down
> that road you should look at adding a caching layer to your application
> (memcached is a popular choice).
>

Caching is not going to be that helpful for my site because the content is
dynamic (which is my site's advantage), but it may be useful for other
applications.

i

Re: Ways to scale a mod_perl site

on 16.09.2009 18:22:35 by Igor Chudov

On Wed, Sep 16, 2009 at 11:15 AM, C. J. L. wrote

>
> I would buy a fast server with 4 or more cpu cores and the SSD or SAS
> drives and run the backend db on a dedicated mysql instance.
>


By the way, guys, the performance difference between a regular SATA drive
and a fast SAS drive is comparatively small.

The difference between a SAS drive and an SSD drive is tremendous.

i

Re: Ways to scale a mod_perl site

on 16.09.2009 18:48:45 by Adam Prime

Igor Chudov wrote:
> On Wed, Sep 16, 2009 at 11:05 AM, Michael Peters wrote:
>
>> On 09/16/2009 11:49 AM, Igor Chudov wrote:
>>
>>> 1) Use a load balancer like perlbal (I am already doing that)
>>
>> A load balancer is good but so are proxies. If you can separate your
>> application server from the server that serves static content then
>> you'll get a boost even if they are on the same machine.
>
> I have very little static content. Even images are generated. My site
> generates images of math formulae such as (x-1)/(x+1) on the fly.

I can understand generating them on the fly for flexibility reasons, but
I'd cache them, and serve them statically after that, rather than
regenerate the images on every single request. You can accomplish that
in the app itself, or just by throwing a caching proxy in front of it
(maybe you're already doing this with perlbal)

Adam

Re: Ways to scale a mod_perl site

on 16.09.2009 19:02:50 by Igor Chudov

On Wed, Sep 16, 2009 at 11:48 AM, Adam Prime wrote:

> Igor Chudov wrote:
>
>> I have very little static content. Even images are generated. My site
>> generates images of math formulae such as (x-1)/(x+1) on the fly.
>
> I can understand generating them on the fly for flexibility reasons, but
> I'd cache them, and serve them statically after that, rather than regenerate
> the images on every single request. You can accomplish that in the app
> itself, or just by throwing a caching proxy in front of it (maybe you're
> already doing this with perlbal)
>

I actually do cache generated pictures: I store them in a database table
called 'bincache'. This way I do not have to compute and draw every image on
the fly. If I have a picture in bincache, I serve it, and if I do not, I
generate it and save it. That saves some CPU, but makes MySQL work harder.
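
The "serve it if cached, otherwise generate and save it" flow described above is the usual cache-aside pattern. A minimal sketch follows; the in-memory hash stands in for whatever store is used (the bincache table or something else), and render_formula and image_for are hypothetical names that are not part of the site's code.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1_hex);

# In-memory stand-in for the cache store. It is only here to keep the
# sketch runnable; a real store would be shared between processes.
my %store;

# Counts how often the expensive rendering step actually runs.
my $renders = 0;

# Hypothetical renderer: a real one would return image bytes.
sub render_formula {
    my ($formula) = @_;
    $renders++;
    return "IMAGE($formula)";
}

# Cache-aside lookup: serve the stored image if present, otherwise
# generate it once and store it for later requests.
sub image_for {
    my ($formula) = @_;
    my $key = 'formula:' . sha1_hex($formula);
    $store{$key} = render_formula($formula) unless exists $store{$key};
    return $store{$key};
}

image_for('(x-1)/(x+1)') for 1 .. 3;
print "rendered $renders time(s) for 3 requests\n";
```

Hashing the formula gives a fixed-length key regardless of how long the formula is, which is convenient for stores that limit key length.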

i

Re: Ways to scale a mod_perl site

on 16.09.2009 19:11:41 by mpeters

On 09/16/2009 12:13 PM, Brad Van Sickle wrote:

> Can I get you to explain this a little more? I don't see how this could
> be used for truly secure sites because I don't quite understand how
> storing a hash in a plain text cookie would be secure.

If you need to store per-session data about a client that the client
shouldn't be able to see, then you just encrypt that data, base-64
encode it and then put it into a cookie.

If you don't care whether the user sees that information and just want
to make sure that they don't change it, then add a secure hash of that
information to the cookie itself and verify it when you receive it.

I like to use JSON for my cookie data because it's simple and fast, but
any serializer should work. Something like this:

    use strict;
    use warnings;
    use JSON qw(to_json from_json);
    use Digest::MD5 qw(md5_hex);
    use MIME::Base64::URLSafe qw(urlsafe_b64encode urlsafe_b64decode);

    # to generate the cookie
    my %data = ( foo => 1, bar => 2, baz => 'frob' );
    $data{secure} = generate_data_hash(\%data);
    my $cookie = urlsafe_b64encode(to_json(\%data));
    print "Cookie: $cookie\n";

    # to process/validate the cookie
    my $new_data = from_json(urlsafe_b64decode($cookie));
    my $new_hash = delete $new_data->{secure};
    if ( defined $new_hash && $new_hash eq generate_data_hash($new_data) ) {
        print "Cookie is ok!\n";
    } else {
        print "Cookie has been tampered with! Ignore.\n";
    }

    # very simple hash generation function
    sub generate_data_hash {
        my $data   = shift;
        my $secret = 'some configured secret';
        # sort the keys so the same data always produces the same hash;
        # Perl does not guarantee the order returned by keys()
        return md5_hex(
            $secret . join('|', map { "$_ - $data->{$_}" } sort keys %$data)
        );
    }

Doing encryption and encoding on small bits of data (like cookies) in
memory will almost always be faster than having to hit the database
(especially if it's on another machine). But the biggest reason is that
it takes the load off the DB and puts it on the web machines which are
much easier to scale linearly.
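
The same signed-cookie idea can be sketched with an HMAC. This variant swaps the MD5-with-secret-prefix hash for HMAC-SHA256 from Digest::SHA, and uses the core modules JSON::PP and MIME::Base64 in place of JSON and MIME::Base64::URLSafe. The function names make_cookie and read_cookie are hypothetical, and the secret is assumed to come from server-side configuration.

```perl
use strict;
use warnings;
use Digest::SHA qw(hmac_sha256_hex);
use JSON::PP ();
use MIME::Base64 qw(encode_base64 decode_base64);

# Assumption: the secret comes from server-side configuration.
my $SECRET = 'some configured secret';

# canonical() sorts keys so the same data always serializes identically.
my $json = JSON::PP->new->canonical;

# Serialize the data, then append an HMAC of the serialized payload.
sub make_cookie {
    my ($data) = @_;
    my $payload = encode_base64($json->encode($data), '');
    return $payload . '.' . hmac_sha256_hex($payload, $SECRET);
}

# Return the decoded data, or undef if the signature does not match.
sub read_cookie {
    my ($cookie) = @_;
    my ($payload, $mac) = split /\./, $cookie // '', 2;
    return unless defined $payload && defined $mac;
    return unless hmac_sha256_hex($payload, $SECRET) eq $mac;
    return $json->decode(decode_base64($payload));
}

my $cookie = make_cookie({ user => 'igor', logged_in => 1 });
print defined read_cookie($cookie)       ? "ok\n" : "rejected\n";
print defined read_cookie($cookie . 'x') ? "ok\n" : "rejected\n";
```

Signing the serialized payload, rather than a re-joined list of fields, means verification never depends on hash key order. The data is signed but not encrypted here, so the user can still read it, and the plain `eq` comparison is not constant-time.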

> I know a lot of true app servers (Websphere, etc..) store
> this data in cached memory,

You could do the same with your session data, or even store it in a
shared resource like a BDB file. But unless it's available to all of
your web servers you're stuck with "sticky" sessions and that's a real
killer for performance/scalability.

--
Michael Peters
Plus Three, LP

Re: Ways to scale a mod_perl site

on 16.09.2009 19:13:56 by mpeters

On 09/16/2009 12:48 PM, Adam Prime wrote:

>> I have very little static content. Even images are generated. My site
>> generates images of math formulae such as (x-1)/(x+1) on the fly.,
>
> I can understand generating them on the fly for flexibility reasons, but
> I'd cache them, and serve them statically after that, rather than
> regenerate the images on every single request.

Definitely good advice. Especially if your images are generated the same
each time and never change. For instance, I don't think the image
generated by the formula "(x-1)/(x+1)" would ever change (unless you
changed your application code and in that case you can clear your cache).

--
Michael Peters
Plus Three, LP

Re: Ways to scale a mod_perl site

on 16.09.2009 19:15:56 by mpeters

On 09/16/2009 01:02 PM, Igor Chudov wrote:

> I actually do cache generated pictures, I store them in a database table
> called 'bincache'. This way I do not have to compute and draw every
> image on the fly. If I have a picture in bincache, I serve it, and if I
> do not, I generate it and save it. That saves some CPU, but makes mysql
> work harder.

Then don't put it in your database. A cache is not a permanent store and
its usage patterns will be different from a database's. I'd either use a
real cache like memcached or have your proxies cache them. In addition
to that you can send the appropriate HTTP cache headers so that browsers
themselves will never request that image again. Make the client machine
do the caching.
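
A sketch of the HTTP cache headers mentioned above, assuming an image whose URL fully determines its content (as with a fixed formula). The one-year lifetime and the helper names http_date and cache_headers are illustrative choices, not a recommendation from the thread.

```perl
use strict;
use warnings;

my @DAYS   = qw(Sun Mon Tue Wed Thu Fri Sat);
my @MONTHS = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# Format an epoch time as an HTTP date, always in GMT and in English,
# independent of the server's locale.
sub http_date {
    my ($epoch) = @_;
    my ($sec, $min, $hour, $mday, $mon, $year, $wday) = gmtime($epoch);
    return sprintf '%s, %02d %s %04d %02d:%02d:%02d GMT',
        $DAYS[$wday], $mday, $MONTHS[$mon], $year + 1900, $hour, $min, $sec;
}

# Build response headers that let browsers and proxies keep an image.
# The one-year lifetime is an assumption for this sketch.
sub cache_headers {
    my ($now) = @_;
    my $max_age = 365 * 24 * 3600;
    return (
        'Cache-Control' => "public, max-age=$max_age",
        'Expires'       => http_date($now + $max_age),
    );
}

my %headers = cache_headers(time);
print "$_: $headers{$_}\n" for sort keys %headers;
```

In a mod_perl 2 handler these pairs would typically be set on the response with `$r->headers_out->set(...)`.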

--
Michael Peters
Plus Three, LP

Re: Ways to scale a mod_perl site

on 16.09.2009 19:21:13 by Doug Sims

I'm curious... what is the hardware like on the one server? How many CPUs
and how much RAM?

Also, a few thoughts...

- You do a 301 from algebra.com to www.algebra.com. That doesn't take much
work from the server, but why not just serve up everything from the original
location?

- The algebra problem I just tried returned twelve separate images. What
if, instead of serving gifs you displayed each stage of transformation of
the equation using HTML and CSS? That would be rather tricky with things
like root signs but I think it could be done - though a bit of work.

I wish this site had been around when I was in high school.



On Wed, Sep 16, 2009 at 11:48 AM, Adam Prime wrote:

> Igor Chudov wrote:
>
>> On Wed, Sep 16, 2009 at 11:05 AM, Michael Peters wrote:
>>
>>> On 09/16/2009 11:49 AM, Igor Chudov wrote:
>>>
>>>> 1) Use a load balancer like perlbal (I am already doing that)
>>>
>>> A load balancer is good but so are proxies. If you can separate your
>>> application server from the server that serves static content then
>>> you'll get a boost even if they are on the same machine.
>>
>> I have very little static content. Even images are generated. My site
>> generates images of math formulae such as (x-1)/(x+1) on the fly.
>
> I can understand generating them on the fly for flexibility reasons, but
> I'd cache them, and serve them statically after that, rather than regenerate
> the images on every single request. You can accomplish that in the app
> itself, or just by throwing a caching proxy in front of it (maybe you're
> already doing this with perlbal)
>
> Adam

Re: Ways to scale a mod_perl site

on 16.09.2009 19:24:02 by Igor Chudov

Guys, I completely love this discussion about cookies. You have really
enlightened me.

I think that letting users store cookie info in a manner that is secure
(involves both encryption and some form of authentication), instead of
storing them in a table, could possibly result in a very substantial
reduction of database use.

The cookie would contain:

1) The encrypted string that I want to store, and
2) An MD5 of that string with a secret code appended that the users do not
know, which serves as a form of signing.

That should work. I will not change it now, but will do so if I get 2x more
traffic.

That way I would need zero hits to the database to handle my users' sessions.


(I only retrieve account information when necessary)

As far as I remember now, I do not store much more information in a session
beyond username. (I hope that I am not wrong). So it should be easy.

Even now, I make sure that I reset the cookie table only every several
months. This way I would let users stay logged on forever.

Thanks a lot.

Igor

Re: Ways to scale a mod_perl site

on 16.09.2009 19:26:20 by Igor Chudov

On Wed, Sep 16, 2009 at 12:21 PM, Douglas Sims wrote:

> I'm curious... what is the hardware like on the one server? How many CPUs
> and RAM?
>
>
AMD Athlon quad core, running 32-bit Ubuntu Hardy, with 16 GB of RAM.
Algebra.Com data is stored on an SSD.


> Also, a few thoughts...
>
> - You do a 301 from algebra.com to www.algebra.com. That doesn't take
> much work from the server, but why not just serve up everything from the
> original location?
>
>
Then I would have to serve algebra.com twice to all search engines.


> - The algebra problem I just tried returned twelve separate images. What
> if, instead of serving gifs you displayed each stage of transformation of
> the equation using HTML and CSS? That would be rather tricky with things
> like root signs but I think it could be done - though a bit of work.
>
>
I rather like the way I do it: my site renders images exactly how I
want, as opposed to letting browsers do it.



> I wish this site had been around when I was in high school.
>
>
>
Thanks. I have some real math addicts on my site, who have solved many
thousands of problems and helped hundreds of kids. I am glad to serve them.

i

> On Wed, Sep 16, 2009 at 11:48 AM, Adam Prime wrote:
>
>> Igor Chudov wrote:
>>
>>> On Wed, Sep 16, 2009 at 11:05 AM, Michael Peters wrote:
>>>
>>>> On 09/16/2009 11:49 AM, Igor Chudov wrote:
>>>>
>>>>> 1) Use a load balancer like perlbal (I am already doing that)
>>>>
>>>> A load balancer is good but so are proxies. If you can separate your
>>>> application server from the server that serves static content then
>>>> you'll get a boost even if they are on the same machine.
>>>
>>> I have very little static content. Even images are generated. My site
>>> generates images of math formulae such as (x-1)/(x+1) on the fly.
>>
>> I can understand generating them on the fly for flexibility reasons, but
>> I'd cache them, and serve them statically after that, rather than regenerate
>> the images on every single request. You can accomplish that in the app
>> itself, or just by throwing a caching proxy in front of it (maybe you're
>> already doing this with perlbal)
>>
>> Adam

Re: Ways to scale a mod_perl site

on 16.09.2009 20:12:34 by Perrin Harkins

On Wed, Sep 16, 2009 at 11:49 AM, Igor Chudov wrote:
> Any thoughts?

In addition to the good advice you're getting on the thread, here are
some books you might find useful:

- Practical mod_perl -- http://modperlbook.org/ -- is old, but has a
lot of general architecture and tuning advice that really hasn't
changed much since then.

- High-Performance MySQL, the best book available on MySQL tuning.

- Building Scalable Websites, which is about PHP sites, but has good
food for thought.

- Scalable Internet Architectures, a book that is more about general
principles to apply to the problem.

And, the most important piece of advice: Devel::NYTProf.

Happy tuning,
Perrin

Re: Ways to scale a mod_perl site

on 16.09.2009 20:23:33 by Igor Chudov

Perrin, thanks a lot. I bought all the books recommended below. They should
be a good read.

I want to be ready when the need arises, and I do not want to do anything
stupid in the meantime that would make me not scalable.

Again, thank you.

Igor

On Wed, Sep 16, 2009 at 1:12 PM, Perrin Harkins wrote:

> On Wed, Sep 16, 2009 at 11:49 AM, Igor Chudov wrote:
> > Any thoughts?
>
> In addition to the good advice you're getting on the thread, here are
> some books you might find useful:
>
> - Practical mod_perl -- http://modperlbook.org/ -- is old, but has a
> lot of general architecture and tuning advice that really hasn't
> changed much since then.
>
> - High-Performance MySQL, the best book available on MySQL tuning.
>
> - Building Scalable Websites, which is about PHP sites, but has good
> food for thought.
>
> - Scalable Internet Architectures, a book that is more about general
> principles to apply to the problem.
>
> And, the most important piece of advice: Devel::NYTProf.
>
> Happy tuning,
> Perrin
>


Re: Ways to scale a mod_perl site

am 16.09.2009 21:40:17 von Scott Gifford

Igor Chudov writes:

> My algebra.com server serves about 77k pageviews and a little over a million
> objects requests per day (with half of it being served in just 4 hours). I peak
> out at 35 requests per second currently.

Some high-level advice: Profile everything you can to see where your
bottlenecks are. If you don't have bottlenecks, simulate enough load
that you do. I am frequently surprised by what turn out to be the
slow parts of my code.

At a high-level, you can use tools like top, vmstat, iostat, iotop,
etc. to check whether it's CPU, memory, or disk space that you're
running out of first.

For CPU, you can use top to see which process is using most of your
CPU, the database, app, or something else.

Inside your app, you can use Perl's profiling tools to see which parts
of your app need to be sped up.
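
A crude way to get a first impression before reaching for a real profiler (Devel::NYTProf is the one recommended elsewhere in this thread) is to time labelled sections by hand. The sketch below only uses the core Time::HiRes module; the `timed` helper, the section labels, and the sleep standing in for a database call are all made up for illustration.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

my %elapsed;   # wall-clock seconds spent per labelled section

# Run $code, add its wall-clock time to $elapsed{$label},
# and pass the result through to the caller.
sub timed {
    my ($label, $code) = @_;
    my $start  = [gettimeofday];
    my @result = $code->();
    $elapsed{$label} += tv_interval($start);
    return wantarray ? @result : $result[0];
}

# "render" is CPU work; "query" sleeps 50 ms as a stand-in for a DB call.
my $sum = timed(render => sub { my $s = 0; $s += $_ for 1 .. 100_000; $s });
timed(query => sub { select undef, undef, undef, 0.05 });

# Slowest section first.
printf "%-8s %.4fs\n", $_, $elapsed{$_}
    for sort { $elapsed{$b} <=> $elapsed{$a} } keys %elapsed;
```

This only tells you which coarse section dominates; a statement-level profiler is still the better tool once you know where to look.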

Hope this is helpful!

----Scott.

Re: Ways to scale a mod_perl site

am 17.09.2009 07:57:35 von Jeff Peng

How many servers?
We have run the systems with about 500 million PV each day, with many squid boxes + 200 apache webservers + 200 mysql hosts.
The applications were written with FastCGI.

-----Original Message-----

From: Igor Chudov

Sent: Sep 16, 2009 11:49 AM

To: Mod_Perl

Subject: Ways to scale a mod_perl site



My algebra.com server serves about 77k pageviews and a little over a million objects requests per day (with half of it being served in just 4 hours). I peak out at 35 requests per second currently.


I use mod_perl, mysql, and perlbal with everything running on one server.

The server has a solid state disk to hold mysql data.

I believe that it can handle 3x-5x more traffic all by itself. However, I am thinking of ways to scale up a mod_perl installation.


1) Use a load balancer like perlbal (I am already doing that)
2) Separate a MySQL database server from webservers.
3) Being enabled by item 2, add more webservers and balancers
4) Create a separate database for cookie data (Apache::Session objects) ??? -- not sure if good idea --


(next level)

5) Use a separate database handle for readonly database requests (SELECT), as opposed to INSERTS and UPDATEs. Use replication to access multiple slave servers for read only data, and only access the master for INSERT and UPDATE and DELETE.


Any thoughts?

Re: Ways to scale a mod_perl site

am 17.09.2009 09:43:50 von Cosimo Streppone

Jeff Peng wrote:

> How many servers?
> We have run the systems with about 500 million PV each day, with many
> squid boxes + 200 apache webservers + 200 mysql hosts.
> The applications were written with FastCGI.

Wow! Why don't you tell or blog a bit about this?
I would love to know more about what challenges
you went through.

Maybe someone else has also stories to tell.

At Opera Software, our team works on a social network website,
and we currently serve 2 million page views per day peak,
with something around 700,000 unique visitors per day peak,
and ~120M hits per day with under 20 servers.

Those include database servers, apache fronts,
mod_perl backends, varnish/memcached caches, upload servers,
cronjobs/mail, etc...

For the curious:
http://www.slideshare.net/cstrep/myoperacom-scalability-v20

And yes, we're looking for ways to optimize/scale better
our application, since we're growing more and more... :-)

--
Cosimo

Re: Ways to scale a mod_perl site

am 17.09.2009 09:48:25 von Cosimo Streppone

On 17 September 2009 at 09:43:50, Cosimo Streppone
wrote:

> Jeff Peng wrote:
>
>> How many servers?
>> We have run the systems with about 500 million PV each day, with many
>> squid boxes + 200 apache webservers + 200 mysql hosts.
>> The applications were written with FastCGI.
>
> Wow! Why don't you tell or blog a bit about this?
>
> [...]
> and we currently serve [...]

Mmh, I just re-read that, and I realized it may sound
like: "Wow, look at how cool we are!".

It wasn't meant to sound like that, obviously,
but more like: here's our experience. We'd like to compare notes
with others on this list.

:-)

--
Cosimo

Re: Ways to scale a mod_perl site

am 17.09.2009 10:12:11 von Jeff Peng

-----Original Message-----
>From: Cosimo Streppone
>Sent: Sep 17, 2009 3:43 AM
>To: Mod_perl users
>Cc: Jeff Peng
>Subject: Re: Ways to scale a mod_perl site
>
>Jeff Peng wrote:
>
>> How many servers?
>> We have run the systems with about 500 million PV each day, with many
>> squid boxes + 200 apache webservers + 200 mysql hosts.
>> The applications were written with FastCGI.
>
>Wow! Why don't you tell or blog a bit about this?
>I would love to know more about what challenges
>you went through.
>


Yup, at that time the primary pressure on performance was the database.
We used distributed MySQL servers with an Oracle index server.
Each MySQL host served 1 - 1.5 million users.
When a user logged in, the application queried Oracle to get the MySQL host id, using the username as the key.
Then the application queried that MySQL host and got anything it wanted.
The systems generated 2T of data each day (of course we had a large amount of storage).
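
The lookup step described above can be sketched roughly as follows. This is only an illustration of the idea: the two hashes stand in for the central index and for the per-host connection settings, and the usernames, shard ids, and DSN strings are invented.

```perl
use strict;
use warnings;

# Stand-in for the central index server: username -> MySQL host id.
my %shard_of_user = ( alice => 1, bob => 2 );

# Stand-in for the per-host connection settings (hypothetical DSNs).
my %dsn_of_shard = (
    1 => 'dbi:mysql:database=mail;host=db1.example.com',
    2 => 'dbi:mysql:database=mail;host=db2.example.com',
);

# Resolve which database a user's data lives on; undef for unknown users.
sub dsn_for_user {
    my ($username) = @_;
    my $shard = $shard_of_user{$username};
    return undef unless defined $shard;
    return $dsn_of_shard{$shard};
}

for my $user (qw(alice bob carol)) {
    my $dsn = dsn_for_user($user);
    print "$user -> ", (defined $dsn ? $dsn : 'unknown user'), "\n";
}
```

In a real application the returned DSN would be handed to DBI->connect, and the index lookup itself is a good candidate for caching.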

The front Apache servers with FastCGI were running heavily; I remember the 8G of memory was almost eaten up.
Squid was useful for static resources, but for dynamic applications like CGI, there was no way to reduce the pressure other than adding more machines.

Last, the application is webmail, from the most popular provider here.


Regards,
Jeff Peng

Re: Ways to scale a mod_perl site

am 17.09.2009 13:26:32 von James Smith

Igor Chudov wrote:
> Guys, I completely love this discussion about cookies. You have really
> enlightened me.
>
> I think that letting users store cookie info in a manner that is secure
> (involves both encryption and some form of authentication), instead of
> storing them in a table, could possibly result in a very substantial
> reduction of database use.
>
>
Alternatively store the information in a two-level cache,
memcached/database, with write-through. Then most of the time you get
the data from memcached. You can do the same with the images...

write entry: -> write data to memcached ; write data to sql cache

read entry: -> read data from memcached and return OR
               read data from sql cache and write to memcached
               and return

Should avoid most database reads! Works well for the images you create,
to minimize database accesses.
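
A rough sketch of that read/write pattern, assuming nothing about the real storage: both tiers are plain in-memory hashes here (one standing in for memcached, one for the SQL cache table), and the `TwoLevelCache` class and its method names are made up for the example.

```perl
use strict;
use warnings;

# Two-level cache sketch. {fast} plays the role of memcached,
# {slow} the role of the SQL cache table.
package TwoLevelCache;

sub new {
    my ($class) = @_;
    return bless { fast => {}, slow => {}, slow_reads => 0 }, $class;
}

# write entry: write data to the fast tier and to the slow tier
sub set {
    my ($self, $key, $value) = @_;
    $self->{fast}{$key} = $value;
    $self->{slow}{$key} = $value;
    return $value;
}

# read entry: try the fast tier first; on a miss, read the slow tier
# and copy the value back into the fast tier before returning it
sub get {
    my ($self, $key) = @_;
    return $self->{fast}{$key} if exists $self->{fast}{$key};

    $self->{slow_reads}++;
    return undef unless exists $self->{slow}{$key};
    return $self->{fast}{$key} = $self->{slow}{$key};
}

# simulate memcached evicting or losing an entry
sub evict {
    my ($self, $key) = @_;
    delete $self->{fast}{$key};
    return;
}

package main;

my $cache = TwoLevelCache->new;
$cache->set('session:igor' => 'username=igor');
print $cache->get('session:igor'), "\n";   # served from the fast tier
$cache->evict('session:igor');
print $cache->get('session:igor'), "\n";   # falls back to the slow tier
print "slow-tier reads: $cache->{slow_reads}\n";
```

The slow tier is only touched when the fast tier has lost an entry, which is the point of the scheme.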
> The cookie is
>
> 1) Encrypted string that I want and
> 2) MD5 of that string with a secret code appended that the users do not
> know, which serves as a form of signing
>
> That should work. I will not change it now, but will do if I get 2x more
> traffic.
>
> That way I would need zero hits to the database to handle my users sessions.
>
>
> (I only retrieve account information when necessary)
>
> As far as I remember now, I do not store much more information in a session
> beyond username. (I hope that I am not wrong). So it should be easy.
>
> Even now, I make sure that I reset the cookie table only every several
> months. This way I would let users stay logged on forever.
>
> Thanks a lot.
>
> Igor
>
>




--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.

Re: Ways to scale a mod_perl site

am 17.09.2009 22:10:44 von Phil Van

Just curious: since you are already running FastCGI, why not serve
dynamic content directly via it? Also, you may eliminate Squid. Using
Apache for static content is good enough (easy to get 5k static PV per
second per server, or 400 million per day).


Phil



On 9/17/09, Jeff Peng wrote:
>
>
> -----Original Message-----
>>From: Cosimo Streppone
>>Sent: Sep 17, 2009 3:43 AM
>>To: Mod_perl users
>>Cc: Jeff Peng
>>Subject: Re: Ways to scale a mod_perl site
>>
>>Jeff Peng wrote:
>>
>>> How many servers?
>>> We have run the systems with about 500 million PV each day, with many
>>> squid boxes + 200 apache webservers + 200 mysql hosts.
>>> The applications were written with FastCGI.
>>
>>Wow! Why don't you tell or blog a bit about this?
>>I would love to know more about what challenges
>>you went through.
>>
>
>
> Yup, at that time the primary pressure against performance was database.
> We used distributed Mysql servers with an oracle index server.
> Each mysql host served 1 - 1.5 million users.
> When an user logined, the application queried oracle to get the mysql host
> id with the key of username.
> Then the application queried to mysql and got anything it wanted.
> The systems generated 2T data each day (surely we had large amount of
> store).
>
> The front apache servers with FastCGI were running heavily, I remember 8G
> memory were almost eated.
> Squid was useful for static resources, but for dynamic applications like
> CGI, no way to reduce the pressure but adding more machines.
>
> Last, the applications are webmail, the best popolar provider here.
>
>
> Regards,
> Jeff Peng
>

Re: Ways to scale a mod_perl site

am 17.09.2009 23:23:14 von torsten.foertsch

On Wed 16 Sep 2009, Igor Chudov wrote:
> >> I have very little static content. Even images are generated. My
> >> site generates images of math formulae such as (x-1)/(x+1) on the
> >> fly.,
> >
> > I can understand generating them on the fly for flexibility
> > reasons, but I'd cache them, and serve them statically after that,
> > rather than regenerate the images on every single request. You can
> > accomplish that in the app itself, or just by throwing a caching
> > proxy in front of it (maybe you're already doing this with perlbal)
>
> I actually do cache generated pictures, I store them in a database
> table called 'bincache'. This way I do not have to compute and draw
> every image on the fly. If I have a picture in bincache, I serve it,
> and if I do not, I generate it and save it. That saves some CPU, but
> makes mysql work harder.

I'd go for Apache's mod_cache + mod_disk_cache. The only thing you have
to do is to set cache control headers. Mod_cache is really fast b/c it
skips almost all of the http request cycle. And in your case it takes
load from the database. The request won't even hit mod_perl.
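
As a sketch of the "set cache control headers" part: the helper below just builds the header pair, and the one-day default lifetime is an assumed value, not something from this thread. The `cache_headers` name is made up.

```perl
use strict;
use warnings;

# Build the response headers that allow a cache in front of the
# application to keep a generated image. The default lifetime of one
# day is an assumption for illustration.
sub cache_headers {
    my (%opt) = @_;
    my $max_age = defined $opt{max_age} ? $opt{max_age} : 86_400;
    return (
        'Cache-Control' => "public, max-age=$max_age",
    );
}

my %headers = cache_headers(max_age => 3600);
print "$_: $headers{$_}\n" for sort keys %headers;

# Inside a mod_perl 2 response handler the same pairs would be applied
# with something like: $r->headers_out->set($name => $value);
```

A formula image never changes for a given input, so a long lifetime is usually safe for that kind of content.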

Torsten

-- 
Need professional mod_perl support?
Just hire me: torsten.foertsch@gmx.net

Re: Ways to scale a mod_perl site

am 18.09.2009 00:07:17 von David Nicol

On Thu, Sep 17, 2009 at 4:23 PM, Torsten Foertsch
wrote:

> I'd go for Apache's mod_cache + mod_disk_cache. The only thing you have
> to do is to set cache control headers. Mod_cache is really fast b/c it
> skips almost all of the http request cycle. And in your case it takes
> load from the database. The request won't even hit mod_perl.
>
> Torsten

it seems like an equivalent way to do this possibly with less
configuration would be to generate the cacheables with file names
representing their input parameters and do the construction of the new
ones with a custom 404 handler. TMTOWTDI.
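
The file-naming half of that idea might look like the sketch below. The cache directory, the two-character subdirectory, and the .png suffix are assumptions for illustration; the 404 handler itself is not shown.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use File::Spec;

# Map a formula's input to a stable file name, so the web server can
# serve the file directly once it exists.
sub image_path_for {
    my ($cache_dir, $formula) = @_;
    my $digest = md5_hex($formula);        # same input, same name
    my $subdir = substr($digest, 0, 2);    # spread files over 256 dirs
    return File::Spec->catfile($cache_dir, $subdir, "$digest.png");
}

my $path = image_path_for('/var/cache/formulae', '(x-1)/(x+1)');
print "$path\n";
print -e $path
    ? "hit: the web server serves the static file\n"
    : "miss: the 404 handler would render the image and save it here\n";
```

Because the name is derived only from the input, the page can link to the image URL without checking whether the file exists yet.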



--
"As if you could kill time without injuring eternity!" -- Henry David Thoreau

Re: Ways to scale a mod_perl site

am 18.09.2009 03:58:22 von Jeff Peng

-----Original Message-----
>From: Phil Van
>Sent: Sep 18, 2009 4:10 AM
>To: Jeff Peng
>Cc: modperl-list
>Subject: Re: Ways to scale a mod_perl site
>
>Just curious: since you are already running FastCGI, why not serving
>dynamic contents directly via it?

we needed some reverse proxies for CDN.
for example, our primary webservers were in ISP A, while in ISP B, we put some squid as reverse proxies to serve the users in local ISP.


>Also, you may eliminate Squid. Using
>Apache for static content is good enough (easy to get 5k static PV per
>second per server, or 400 millions per day).
>

No. I'm sure Apache is worse than Squid at serving static content.
When I was in another department, I maintained the systems for an ad union (like Google's ads).
All content was static, and the PV each day was about 200 million,
but we had fewer than 20 Squid boxes (IIRC it was 18) handling that volume of requests.
The same number of Apache servers couldn't have handled that load.



Regards,
Jeff Peng

Re: Ways to scale a mod_perl site

am 18.09.2009 10:42:30 von Tina Mueller

On Wed, 16 Sep 2009, Igor Chudov wrote:

> On Wed, Sep 16, 2009 at 11:05 AM, Michael Peters wrote:
>
>> Reducing DB usage is more important than this. Also, before you go down
>> that road you should look at adding a caching layer to your application
>> (memcached is a popular choice).
>>
>>
> It is not going to be that helpful due to dynamic content. (which is my
> site's advantage). But this may be useful for other applications.

That's a common misconception, I think. Even if a website is completely
dynamic you can cache things.
First misunderstanding: people think about caching a whole HTML page.
In memcached you typically cache data structures.
As an example I will take my portal software. It has a forum, blog,
guestbook, it has a list of users who are online, the forum has a
"posts from the last 24 hours", and it has other things that are
shown with every request (new private messages, notes for moderators
about new forum threads, ...)

Now, should I fetch the online users from the database with every request?
Should I fetch all the threads and authors of the last 24 hours whenever
somebody requests that page, even if nothing has changed?

First answer: the list of online users can be cached for, say 1 minute.
Nobody will care or even notice. Only if someone logs in you will expire
the entry in memcached. All other changes are not so important that
you cannot cache them for one single minute.

Make that 1 page view per second and you save 59 database requests per
minute.


Second: The list of the recent posts can be cached, let's say for 3 minutes.
The entry in memcached is only expired explicitly when somebody posts a new
thread/reply or a title is changed etc.
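
Here is a small sketch of both ideas together, a time-limited entry plus explicit expiry. The `%store` hash is an in-memory stand-in for memcached, and the `cached` and `expire` helpers and the key name are made up for the example.

```perl
use strict;
use warnings;

# In-memory stand-in for memcached with per-entry expiry times.
my %store;

# Return the cached value for $key, or call $fetch (the "database
# query") and cache its result for $ttl seconds. $now is optional and
# only there so the example can simulate the clock.
sub cached {
    my ($key, $ttl, $fetch, $now) = @_;
    $now = time unless defined $now;
    my $entry = $store{$key};
    return $entry->{value} if $entry && $entry->{expires} > $now;

    my $value = $fetch->();
    $store{$key} = { value => $value, expires => $now + $ttl };
    return $value;
}

# Explicit expiry, e.g. when someone logs in or posts a new reply.
sub expire {
    my ($key) = @_;
    delete $store{$key};
    return;
}

my $db_queries   = 0;
my $fetch_online = sub { $db_queries++; return [ 'tina', 'igor' ] };

# 60 page views inside one minute, one per second
cached('online_users', 60, $fetch_online, 1000 + $_) for 0 .. 59;
print "database queries for 60 page views: $db_queries\n";

expire('online_users');                          # a user logged in
cached('online_users', 60, $fetch_online, 1060);
print "after explicit expiry: $db_queries\n";
```

With one page view per second and a 60-second lifetime, only the first view of each minute reaches the database, which is where the "59 saved requests" figure comes from.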

I believe *every* application has things like that which you can cache.
I know of a website that reduced its load dramatically when using
memcached. It's quite a big website (300 million page views per month
locally (mostly requests from German-speaking countries)).
But some people are reluctant to use memcached. One person said to me
"what if storing the data in memcached is more work than fetching it
from the database every time?" I don't know what to say. Try it out.
I know the example of the company who uses memcached.
Did you know Facebook uses memcached very, very extensively?
If you're not sure: analyse your website usage. What kind of data
is fetched how often? Make a test case and use memcached for that and
see what's faster.


Another thing you could do is: separate your database schema.
Some tables do not connect to others. For example, my portal software
is modular, so that you can activate/deactivate certain modules.
The easiest thing was to just create one (DBIx::Class) schema per
module. Of course, what connects all these is the user schema, and
because I cannot do joins to the user tables any more I might have
a request more here and there. But I can separate all these schemas
and put them all on their own database server. With every request,
you only need a part of all the schemas.
Typically the highest load is on the database, so splitting the db across several
servers like this might be an option.

And last but not least: for searching the database, use a search engine.
KinoSearch works quite well, and there are also other search engines for perl.

regards,
tina

Re: Ways to scale a mod_perl site

am 18.09.2009 14:21:05 von Jeff Peng

-----Original Message-----
>From: Brad Van Sickle
>Sent: Sep 17, 2009 12:13 AM
>To: Michael Peters
>Cc: Mod_Perl
>Subject: Re: Ways to scale a mod_perl site

> but I was unaware that there might be
>an option for doing this without using a DB with mod_perl .

As Tina said, how about using memcached for this case?

Regards,
Jeff Peng

Re: Ways to scale a mod_perl site

am 18.09.2009 14:24:08 von Matthew Paluch


While many great minds are here, I would like to focus on one point for a
moment, which in my experience, has been the most critical:

The database

Before I were to ask any other of your questions (all of which were valid),
I would ask myself:

- What kind of database tables am I implementing? (InnoDB, Berkeley, etc.)
What effect do they have on the filesystem, or the pagefile?
- How have I defined connections, connection pooling, shared resources,
partitions v logical drives, semaphores v shared memory handles to handles?
- Have I analyzed which tables actually get used, and by which processes,
and paid attention to which operations only require simple foreign to
primary key relationships, and not complex JOINS?

And secondarily:

- Is it possible to set up simple READ-ONLY copies of frequently read but
rarely changed data (such as login information) so that some work could be
off-loaded in an intelligent manner without regards to load balancing?

In short, the database's interaction with the main application is most
commonly the issue, regardless of the underlying technologies. Start there,
so says I.

Matthew P
gedanken

--
Matthew Paluch
404.375.8898


Re: Ways to scale a mod_perl site

am 18.09.2009 15:06:36 von Igor Chudov


Michael, you inspired me to reimplement cookies this way. For my site, the
cookie table is the most frequently updated one (even though I do not grant
cookies to search engines). I will try to use this kind of implementation.

Even now, my users like the fact that they can stay signed on forever, but
now I can do it at no cost to myself.

A quick question, is there an existing perl module to do this sort of thing?

Igor

On Wed, Sep 16, 2009 at 12:11 PM, Michael Peters wrote:

> On 09/16/2009 12:13 PM, Brad Van Sickle wrote:
>
>> Can I get you to explain this a little more? I don't see how this could
>> be used for truly secure sites because I don't quite understand how
>> storing a hash in a plain text cookie would be secure.
>>
>
> If you need to store per-session data about a client that the client
> shouldn't be able to see, then you just encrypt that data, base-64 encode it
> and then put it into a cookie.
>
> If you don't care if the user sees that information you just want to make
> sure that they don't change it then add an extra secure hash of that
> information to the cookie itself and then verify it when you receive it.
>
> I like to use JSON for my cookie data because it's simple and fast, but any
> serializer should work. Something like this:
>
> use JSON qw(to_json from_json);
> use Digest::MD5 qw(md5_hex);
> use MIME::Base64::URLSafe qw(urlsafe_b64encode urlsafe_b64decode);
>
> # to generate the cookie
> my %data = ( foo => 1, bar => 2, baz => 'frob' );
> $data{secure} = generate_data_hash(\%data);
> my $cookie = urlsafe_b64encode(to_json(\%data));
> print "Cookie: $cookie\n";
>
> # to process/validate the cookie
> my $new_data = from_json(urlsafe_b64decode($cookie));
> my $new_hash = delete $new_data->{secure};
> if( $new_hash eq generate_data_hash($new_data) ) {
>     print "Cookie is ok!\n";
> } else {
>     print "Cookie has been tampered with! Ignore.\n";
> }
>
> # very simple hash generation function
> # (keys are sorted so the digest does not depend on hash ordering)
> sub generate_data_hash {
>     my $data = shift;
>     my $secret = 'some configured secret';
>     return md5_hex($secret . join('|', map { "$_ - $data->{$_}" } sort keys %$data));
> }
>
> Doing encryption and encoding on small bits of data (like cookies) in
> memory will almost always be faster than having to hit the database
> (especially if it's on another machine). But the biggest reason is that it
> takes the load off the DB and puts it on the web machines which are much
> easier to scale linearly.
>
>> I know a lot of true app servers (Websphere, etc..) store
>> this data in cached memory,
>>
>
> You could do the same with your session data, or even store it in a shared
> resource like a BDB file. But unless it's available to all of your web
> servers you're stuck with "sticky" sessions and that's a real killer for
> performance/scalability.
>
>
> --
> Michael Peters
> Plus Three, LP
>


Re: Ways to scale a mod_perl site

am 18.09.2009 15:12:05 von Fayland

This?
http://search.cpan.org/~jkrasnoo/ApacheCookieEncrypted-0.03/Encrypted.pm

Catalyst has a plugin:
http://search.cpan.org/~lbrocard/Catalyst-Plugin-CookiedSession-0.35/lib/Catalyst/Plugin/CookiedSession.pm

Thanks.

On Fri, Sep 18, 2009 at 9:06 PM, Igor Chudov wrote:
> Michael, you inspired me to reimplement cookies this way. For my site, the
> cookie table is the most frequently updated one (even though I do not grant
> cookies to search engines). I will try to use this kind of implementation.
>
> Even now, my users like the fact that they can stay signed on forever, but
> now I can do it at no cost to myself.
>
> A quick question, is there an existing perl module to do this sort of thing?
>
> Igor
>
> On Wed, Sep 16, 2009 at 12:11 PM, Michael Peters
> wrote:
>>
>> On 09/16/2009 12:13 PM, Brad Van Sickle wrote:
>>
>>> Can I get you to explain this a little more? I don't see how this could
>>> be used for truly secure sites because I don't quite understand how
>>> storing a hash in a plain text cookie would be secure.
>>
>> If you need to store per-session data about a client that the client
>> shouldn't be able to see, then you just encrypt that data, base-64 encode it
>> and then put it into a cookie.
>>
>> If you don't care if the user sees that information you just want to make
>> sure that they don't change it then add an extra secure hash of that
>> information to the cookie itself and then verify it when you receive it.
>>
>> I like to use JSON for my cookie data because it's simple and fast, but
>> any serializer should work. Something like this:
>>
>> use JSON qw(to_json from_json);
>> use Digest::MD5 qw(md5_hex);
>> use MIME::Base64::URLSafe qw(urlsafe_b64encode urlsafe_b64decode);
>>
>> # to generate the cookie
>> my %data = ( foo => 1, bar => 2, baz => 'frob' );
>> $data{secure} = generate_data_hash(\%data);
>> my $cookie = urlsafe_b64encode(to_json(\%data));
>> print "Cookie: $cookie\n";
>>
>> # to process/validate the cookie
>> my $new_data = from_json(urlsafe_b64decode($cookie));
>> my $new_hash = delete $new_data->{secure};
>> if( $new_hash eq generate_data_hash($new_data) ) {
>>     print "Cookie is ok!\n";
>> } else {
>>     print "Cookie has been tampered with! Ignore.\n";
>> }
>>
>> # very simple hash generation function
>> # (keys are sorted so the digest does not depend on hash ordering)
>> sub generate_data_hash {
>>     my $data = shift;
>>     my $secret = 'some configured secret';
>>     return md5_hex($secret . join('|', map { "$_ - $data->{$_}" } sort keys %$data));
>> }
>>
>> Doing encryption and encoding on small bits of data (like cookies) in
>> memory will almost always be faster than having to hit the database
>> (especially if it's on another machine). But the biggest reason is that it
>> takes the load off the DB and puts it on the web machines which are much
>> easier to scale linearly.
>>
>>> I know a lot of true app servers (Websphere, etc..) store
>>> this data in cached memory,
>>
>> You could do the same with your session data, or even store it in a shared
>> resource like a BDB file. But unless it's available to all of your web
>> servers you're stuck with "sticky" sessions and that's a real killer for
>> performance/scalability.
>>
>> --
>> Michael Peters
>> Plus Three, LP
>
>



-- 
Fayland Lam // http://www.fayland.org/

Re: Ways to scale a mod_perl site

am 18.09.2009 15:21:57 von Igor Chudov


On Fri, Sep 18, 2009 at 8:12 AM, Fayland Lam wrote:

> This?
> http://search.cpan.org/~jkrasnoo/ApacheCookieEncrypted-0.03/Encrypted.pm
>
> Catalyst has a plugin:
>
> http://search.cpan.org/~lbrocard/Catalyst-Plugin-CookiedSession-0.35/lib/Catalyst/Plugin/CookiedSession.pm
>

This module seems to want libapreq-1.34, which I interpret as not being
compatible with mod_perl 2?

I tried installing it with CPAN on Ubuntu Jaunty and failed.

CPAN.pm: Going to build I/IS/ISAAC/libapreq-1.34.tar.gz

Please install mod_perl: 1.25 < version < 1.99
(Can't locate mod_perl.pm in @INC (@INC contains: /root/misc/life/modules
/root/lisleelectric.com /etc/perl /usr/local/lib/perl/5.10.0
/usr/local/share/perl/5.10.0 /usr/lib/perl5 /usr/share/perl5
/usr/lib/perl/5.10 /usr/share/perl/5.10 /usr/local/lib/site_perl .) at
Makefile.PL line 7.
) at Makefile.PL line 8.
BEGIN failed--compilation aborted at Makefile.PL line 18.
Warning: No success on command[/usr/bin/perl Makefile.PL INSTALLDIRS=site]
Warning (usually harmless): 'YAML' not installed, will not store persistent
state
ISAAC/libapreq-1.34.tar.gz
/usr/bin/perl Makefile.PL INSTALLDIRS=site -- NOT OK
Running make test

Igor


> Thanks.
>
> On Fri, Sep 18, 2009 at 9:06 PM, Igor Chudov wrote:
> > Michael, you inspired me to reimplement cookies this way. For my site, the
> > cookie table is the most frequently updated one (even though I do not grant
> > cookies to search engines). I will try to use this kind of implementation.
> >
> > Even now, my users like the fact that they can stay signed on forever, but
> > now I can do it at no cost to myself.
> >
> > A quick question, is there an existing perl module to do this sort of thing?
> >
> > Igor
> >
> > On Wed, Sep 16, 2009 at 12:11 PM, Michael Peters
> > wrote:
> >>
> >> On 09/16/2009 12:13 PM, Brad Van Sickle wrote:
> >>
> >>> Can I get you to explain this a little more? I don't see how this could
> >>> be used for truly secure sites because I don't quite understand how
> >>> storing a hash in a plain text cookie would be secure.
> >>
> >> If you need to store per-session data about a client that the client
> >> shouldn't be able to see, then you just encrypt that data, base-64 encode it
> >> and then put it into a cookie.
> >>
> >> If you don't care if the user sees that information you just want to make
> >> sure that they don't change it then add an extra secure hash of that
> >> information to the cookie itself and then verify it when you receive it.
> >>
> >> I like to use JSON for my cookie data because it's simple and fast, but
> >> any serializer should work. Something like this:
> >>
> >> use JSON qw(to_json from_json);
> >> use Digest::MD5 qw(md5_hex);
> >> use MIME::Base64::URLSafe qw(urlsafe_b64encode urlsafe_b64decode);
> >>
> >> # to generate the cookie
> >> my %data = ( foo => 1, bar => 2, baz => 'frob' );
> >> $data{secure} = generate_data_hash(\%data);
> >> my $cookie = urlsafe_b64encode(to_json(\%data));
> >> print "Cookie: $cookie\n";
> >>
> >> # to process/validate the cookie
> >> my $new_data = from_json(urlsafe_b64decode($cookie));
> >> my $new_hash = delete $new_data->{secure};
> >> if( $new_hash eq generate_data_hash($new_data) ) {
> >>    print "Cookie is ok!\n";
> >> } else {
> >>    print "Cookie has been tampered with! Ignore.\n";
> >> }
> >>
> >> # very simple hash generation function
> >> sub generate_data_hash {
> >>    my $data = shift;
> >>    my $secret = 'some configured secret';
> >>    # sort the keys so both sides hash the fields in the same order
> >>    return md5_hex($secret . join('|', map { "$_ - $data->{$_}" } sort keys %$data));
> >> }
> >>
> >> Doing encryption and encoding on small bits of data (like cookies) in
> >> memory will almost always be faster than having to hit the database
> >> (especially if it's on another machine). But the biggest reason is that it
> >> takes the load off the DB and puts it on the web machines which are much
> >> easier to scale linearly.
> >>
> >>> I know a lot of true app servers (Websphere, etc..) store
> >>> this data in cached memory,
> >>
> >> You could do the same with your session data, or even store it in a shared
> >> resource like a BDB file. But unless it's available to all of your web
> >> servers you're stuck with "sticky" sessions and that's a real killer for
> >> performance/scalability.
> >>
> >> --
> >> Michael Peters
> >> Plus Three, LP
> >
> >
>
>
>
> --
> Fayland Lam // http://www.fayland.org/
>

--00032555b392382c4a0473da04b7--

Re: Ways to scale a mod_perl site

am 18.09.2009 16:09:47 von David Avery

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

unsubscribe

Jeff Peng wrote:
>
> -----Original Message-----
>> From: Brad Van Sickle
>> Sent: Sep 17, 2009 12:13 AM
>> To: Michael Peters
>> Cc: Mod_Perl
>> Subject: Re: Ways to scale a mod_perl site
>
>> but I was unaware that there might be
>> an option for doing this without using a DB with mod_perl .
>
> As Tina said, how about using memcached for this case?
>
> Regards,
> Jeff Peng
>
>



- --
David Avery
Front Gate Solutions
1711 South Congress Austin, TX 78704
Ph: 512-674-9364
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkqzlKsACgkQ7SsBcHOnG7JsUwCfVTesb2CKmK2QtgBa5ZU9 waTW
XIQAoK0kbL1rlBBnXQ6rHl3bOHWf04yI
=k0ZL
-----END PGP SIGNATURE-----

Re: Ways to scale a mod_perl site

am 18.09.2009 17:13:17 von Tina Mueller

On Wed, 16 Sep 2009, Michael Peters wrote:

> On 09/16/2009 12:13 PM, Brad Van Sickle wrote:
>
>> Can I get you to explain this a little more? I don't see how this could
>> be used for truly secure sites because I don't quite understand how
>> storing a hash in a plain text cookie would be secure.
>
> If you need to store per-session data about a client that the client
> shouldn't be able to see, then you just encrypt that data, base-64 encode it
> and then put it into a cookie.

How does the user invalidate that "session"? (in case the cookie leaked
or something like that). Or how can the website owner log out a certain
user?
If I have a session cookie with data in the server database I can always
invalidate that session by logging out and thus removing the database
entry.
I personally prefer to have control over such things...

Is one select per request that bad? If the website is completely
dynamic you will probably have other requests as well?

If you care about the number of selects, IMHO you would do better to save
those with the help of caching.

Re: Ways to scale a mod_perl site

am 18.09.2009 17:48:04 von mpeters

On 09/18/2009 11:13 AM, Tina Mueller wrote:

> How does the user invalidate that "session"? (in case the cookie leaked
> or something like that). Or how can the website owner log out a certain
> user?

When you generate the hash for the cookie, you can also include the
timestamp and the IP address of the client. If the cookie leaks it can't
be used (unless the person who steals it is also on the same NAT'd
network and uses it quickly). But you'll have that same problem anyway.
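
A minimal sketch of that idea, assuming a token layout of payload|issued_at|signature and using HMAC-SHA256 from the core Digest::SHA module rather than a bare MD5. The secret, the lifetime and the function names are placeholders, not anything from an existing module:

```perl
use strict;
use warnings;
use Digest::SHA qw(hmac_sha256_hex);

my $SECRET  = 'some configured secret';  # hypothetical; shared by all web servers
my $MAX_AGE = 3600;                      # hypothetical lifetime in seconds

# The client IP is mixed into the signature but not stored in the token.
sub issue_token {
    my ($payload, $ip, $now) = @_;
    my $sig = hmac_sha256_hex(join("\0", $payload, $now, $ip), $SECRET);
    return join '|', $payload, $now, $sig;
}

# Return the payload if the token is fresh and was issued to this IP,
# undef otherwise.
sub check_token {
    my ($token, $ip, $now) = @_;
    return undef unless $token =~ /\A(.*)\|(\d+)\|([0-9a-f]{64})\z/s;
    my ($payload, $issued, $sig) = ($1, $2, $3);
    return undef if $now - $issued > $MAX_AGE;   # too old
    return undef unless $sig eq hmac_sha256_hex(join("\0", $payload, $issued, $ip), $SECRET);
    return $payload;
}
```

In a handler the current time would come from time() and the address from the request; they are parameters here only to keep the sketch easy to exercise.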

> Is one select per request that bad? if the website is completely
> dynamic you will probably have other requests as well?

One extra select on every request can add up. In most web architectures
the DB is a scarce shared resource.

> If you care about the number of selects, IMHO you would do better to save
> those with the help of caching.

Caching of sessions could help, but if you don't need to go down that
road, why do it in the first place?

--
Michael Peters
Plus Three, LP

Re: Ways to scale a mod_perl site

am 18.09.2009 17:52:57 von Scott Gifford

Brad Van Sickle writes:

>>
>>> 3) Being enabled by item 2, add more webservers and balancers
>>> 4) Create a separate database for cookie data (Apache::Session objects)
>>> ??? -- not sure if good idea --
>>
>> I've never seen the need to do that. In fact, I would suggest you
>> drop sessions altogether if you can. If you need any per-session
>> information then put it in a cookie. If you need this information to
>> be tamper-proof then you can create a hash of the cookie's data that
>> you store as part of the cookie. If you can reduce the # of times
>> that each request needs to actually hit the database you'll have big
>> wins.
>>
>>
>
> Can I get you to explain this a little more? I don't see how this
> could be used for truly secure sites because I don't quite understand
> how storing a hash in a plain text cookie would be secure.

The general idea is that you store a cryptographic hash of the cookie
information plus a secret only your app knows. Using | to show string
concatenation, your cookie would be:

YourCookieFields|HASH(YourCookieFields|YourSecret)

An attacker can't create the right hash because they don't know your
secret, and they can't change any fields in the cookie because the
hash would become invalid.
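
As a rough sketch of the scheme above: the code below swaps the plain HASH(fields|secret) for HMAC-SHA256 from the core Digest::SHA module, which is a keyed hash built for exactly this purpose. That substitution, the secret and the function names are assumptions for illustration.

```perl
use strict;
use warnings;
use Digest::SHA qw(hmac_sha256_hex);

# Hypothetical secret; in practice it comes from configuration that is
# identical on every web server and never sent to the client.
my $SECRET = 'some configured secret';

# Produce "fields|signature" for an already-serialized field string.
sub sign_cookie {
    my ($fields) = @_;
    return $fields . '|' . hmac_sha256_hex($fields, $SECRET);
}

# Return the field string if the signature checks out, undef otherwise.
sub verify_cookie {
    my ($cookie) = @_;
    # The signature is the 64 hex characters after the last '|'.
    return undef unless defined $cookie
        && $cookie =~ /\A(.*)\|([0-9a-f]{64})\z/s;
    my ($fields, $signature) = ($1, $2);
    return hmac_sha256_hex($fields, $SECRET) eq $signature ? $fields : undef;
}

my $cookie = sign_cookie('userid=42&premium=0');
print defined verify_cookie($cookie) ? "ok\n" : "tampered\n";
```

Changing any character of the fields, or of the signature, makes verify_cookie return undef.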

-----Scott.

Re: Ways to scale a mod_perl site

am 18.09.2009 17:56:29 von Scott Gifford

Tina Mueller writes:

> On Wed, 16 Sep 2009, Michael Peters wrote:
>
[...]
>> If you need to store per-session data about a client that the client
>> shouldn't be able to see, then you just encrypt that data, base-64
>> encode it and then put it into a cookie.
>
> How does the user invalidate that "session"? (in case the cookie leaked
> or something like that). Or how can the website owner log out a certain
> user?

Right, that is the trade-off for improved performance and scalability.
Different trade-offs will make sense for different sites. For most
sites, the performance and scalability won't matter too much, but for
some it will.

Simple things like timestamping the cookie and expiring it after
a while can help some, but they will not get you the flexibility of
keeping everything in a database.

-----Scott.

RE: Ways to scale a mod_perl site

am 18.09.2009 18:16:33 von dihnen

It amounts to shared private key security.

Each web server, for instance, is configured with the key abcd1234

The session looks like

{ username => 'dog'
, group => 'canid'
, premium => 0
, login_time => 1253289574
}

I serialize that into a string, sorting the keys so the order is always the same:

$cookiebase = join '|', map { $_, $session->{$_} } sort keys %$session;

which gives

group|canid|login_time|1253289574|premium|0|username|dog

I apply md5_hex from the Digest::MD5 module

$signature = md5_hex($cookiebase . "|" . 'abcd1234');

Which yields a 32-character hex digest, for example

68b07c585c18282ea418937266b031d7

I then construct my cookie.

$cookie = join ':', %$session, $signature;

So the cookie string looks like

premium:0:login_time:1253289574:username:dog:group:canid:68b07c585c18282ea418937266b031d7

(the order of the pairs in the cookie does not matter, because the receiver sorts the keys again before hashing).

When I receive the cookie on a request I just do the inverse.

my @fields = split /:/, $cookie;
my $signature = pop @fields;
my %cookie = @fields;

die 'BOGUS SESSION' unless $signature eq md5_hex(join '|', (map { $_, $cookie{$_} } sort keys %cookie), 'abcd1234');

If you change the 'plaintext' string in any way - the md5_hex will change. If you change or drop the signature, the comparison will fail.

It's security through obscurity admittedly - security in that you can't see my code, methodology, or shared secret configuration.

But most people consider that plenty secure for securing the session data.

David


-----Original Message-----
From: Brad Van Sickle [mailto:bvansickle3@gmail.com]
Sent: Wednesday, September 16, 2009 9:13 AM
To: Michael Peters
Cc: Mod_Perl
Subject: Re: Ways to scale a mod_perl site


>
>> 3) Being enabled by item 2, add more webservers and balancers
>> 4) Create a separate database for cookie data (Apache::Session objects)
>> ??? -- not sure if good idea --
>
> I've never seen the need to do that. In fact, I would suggest you drop
> sessions altogether if you can. If you need any per-session
> information then put it in a cookie. If you need this information to
> be tamper-proof then you can create a hash of the cookie's data that
> you store as part of the cookie. If you can reduce the # of times that
> each request needs to actually hit the database you'll have big wins.
>
>

Can I get you to explain this a little more? I don't see how this could
be used for truly secure sites because I don't quite understand how
storing a hash in a plain text cookie would be secure.

The thing I hate most about my "secure" applications is the fact that I
have to read the DB at the start of every request to ensure that the
session cookie is valid and to extract information about the user from
the session table using the session ID stored in the cookie. Hitting
the DB is the quickest way to kill performance and scalability in my
experience. I know a lot of true app servers (Websphere, etc..)
store this data in cached memory, but I was unaware that there might be
an option for doing this without using a DB with mod_perl.

Re: Ways to scale a mod_perl site

am 18.09.2009 18:42:51 von mpeters

On 09/18/2009 12:16 PM, Ihnen, David wrote:

> It's security through obscurity admittedly - security in that you can't see my code, methodology, or shared secret configuration.

No it's not really through obscurity. Even if someone found out your
method of serialization your data is still safe. It's only if they find
out your secret key that you'll have problems. But that's the same for
SSL, PGP and any other crypto.

--
Michael Peters
Plus Three, LP

Re: Ways to scale a mod_perl site

am 18.09.2009 19:00:57 von Igor Chudov

--0015175cdc5e6fcc910473dd13cd
Content-Type: text/plain; charset=ISO-8859-1

On Fri, Sep 18, 2009 at 10:13 AM, Tina Mueller wrote:

> On Wed, 16 Sep 2009, Michael Peters wrote:
>
>> On 09/16/2009 12:13 PM, Brad Van Sickle wrote:
>>
>>> Can I get you to explain this a little more? I don't see how this could
>>> be used for truly secure sites because I don't quite understand how
>>> storing a hash in a plain text cookie would be secure.
>>
>> If you need to store per-session data about a client that the client
>> shouldn't be able to see, then you just encrypt that data, base-64 encode it
>> and then put it into a cookie.
>
>
> How does the user invalidate that "session"? (in case the cookie leaked
> or something like that). Or how can the website owner log out a certain
> user?
>

Same way you do with a table: when the user logs out, you update their
cookie to a new one, where "userid" is not set.



> If I have a session cookie with data in the server database I can always
> invalidate that session by logging out and thus removing the database
> entry.
> I personally prefer to have control over such things...
>
> Is one select per request that bad? If the website is completely
> dynamic you will probably have other requests as well?
>
>
Well, the cookie table is the one that gets hit a lot and grows out of
control. It is hard to scale and replicate. Storing cookies on the browsers
solves this completely. I can have a billion browsers connect to my site and
no database growth will occur from that.



> If you care about the number of selects, IMHO you would do better to save
> those with the help of caching.
>

--0015175cdc5e6fcc910473dd13cd--

Re: Ways to scale a mod_perl site

am 18.09.2009 21:15:41 von Igor Chudov

--0015175cffac3d9c6a0473def541
Content-Type: text/plain; charset=ISO-8859-1

On Fri, Sep 18, 2009 at 12:11 PM, James Smith wrote:

> Igor Chudov wrote:
>
>> On Fri, Sep 18, 2009 at 10:13 AM, Tina Mueller wrote:
>>
>>> On Wed, 16 Sep 2009, Michael Peters wrote:
>>>
>>>> On 09/16/2009 12:13 PM, Brad Van Sickle wrote:
>>>>
>>>>> Can I get you to explain this a little more? I don't see how this could
>>>>> be used for truly secure sites because I don't quite understand how
>>>>> storing a hash in a plain text cookie would be secure.
>>>>
>>>> If you need to store per-session data about a client that the client
>>>> shouldn't be able to see, then you just encrypt that data, base-64 encode it
>>>> and then put it into a cookie.
>>>
>>> How does the user invalidate that "session"? (in case the cookie leaked
>>> or something like that). Or how can the website owner log out a certain
>>> user?
>>
>> Same way you do with a table: when the user logs out, you update their
>> cookie to a new one, where "userid" is not set.
>
>
> You missed the point in the previous email - that is when the "System" logs
> the user out... User X does something naughty - you need to ban him from
> doing anything else so you make his cookie invalid - this information can
> only occur on the server side so you delete the reference in the database
> referring to this session.. You are now having to check this each time,
> which is the same as getting the information out of the database...
>
> I also think that everybody putting lots of stuff in their cookies is not
> thinking about network latency, bandwidth etc - remember unless you are very
> careful how you specify your cookies you end up sending them on every
> request - for images/js/css/etc this adds to both bandwidth and CPU power -
> so it's a case of swings and roundabouts... balance your considerations....
>
> You will usually find a fast write-through cache is the best solution for
> most information on the backend... and being careful to only really create
> sessions when you have to!
>
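
One way to be "careful how you specify your cookies" is to scope the cookie to the dynamic part of the site, so the browser does not attach it to requests for static files. A small sketch, assuming the application lives under /app and images/js/css live elsewhere (both paths and the helper name are made up):

```perl
use strict;
use warnings;

# Build a Set-Cookie header value restricted to one URL prefix.
# The browser only sends the cookie back for URLs under $path.
sub set_cookie_header {
    my ($name, $value, $path) = @_;
    return "$name=$value; Path=$path; HttpOnly";
}

print 'Set-Cookie: ', set_cookie_header('session', 'abc123', '/app'), "\n";
```

Serving static files from a separate hostname has a similar effect, since cookies set for one host are not sent to another.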

Thanks. I think that I understand the issue a little better.

When I delete someone's account, I blow away everything, their users table
entry, and all their content.

So when they have a cookie with an invalid userid, they cannot do that much
-- but I gotta admit that I need to think this through a little better.

i

--0015175cffac3d9c6a0473def541--

Re: Ways to scale a mod_perl site

am 19.09.2009 00:38:10 von Ask Bjoern Hansen

On Sep 16, 2009, at 9:13, Brad Van Sickle wrote:

>> I've never seen the need to do that. In fact, I would suggest you
>> drop sessions altogether if you can. If you need any per-session
>> information then put it in a cookie. If you need this information
>> to be tamper-proof then you can create a hash of the cookie's data
>> that you store as part of the cookie. If you can reduce the # of
>> times that each request needs to actually hit the database you'll
>> have big wins.
>
> Can I get you to explain this a little more? I don't see how this
> could be used for truly secure sites because I don't quite
> understand how storing a hash in a plain text cookie would be secure.


If you are just concerned about the cookie being changed, add a timestamp
and a hash to the cookie data.

There's an example on page 19 of http://develooper.com/talks/rww-mysql-2008.pdf
...

If you are concerned about the cookie being readable at all, you can
encrypt the whole thing.

Either way it's "tamper proof".


- ask

--
http://develooper.com/ - http://askask.com/

Re: Ways to scale a mod_perl site

am 19.09.2009 20:43:31 von apache

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Fri, 18 Sep 2009, Igor Chudov wrote:

> On Fri, Sep 18, 2009 at 10:13 AM, Tina Mueller wrote:
>
> > How does the user invalidate that "session"? (in case the cookie leaked
> > or something like that). Or how can the website owner log out a certain
> > user?
> >
>
> Same way you do with a table: when the user logs out, you update their
> cookie to a new one, where "userid" is not set.

That doesn't invalidate the cookie.
It resets the cookie in the browser, but the string itself is still a valid
session and can be reused.
Since there is nothing stored about it server side the server just gets
the session string from the client and doesn't care (doesn't know) if
any browser "logged out".

And storing the IP in the session wouldn't work for users that get a
new IP very often. On the other hand, several users might have the
same IP in the view of the server.

> > Is one select per request that bad? if the website is completely
> > dynamic you will probably have other requests as well?
> >
> >
> Well, the cookie table is the one that gets hit a lot and grows out of
> control. It is hard to scale and replicate. Storing cookies on the browsers
> solves this completely. I can have a billion browsers connect to my site and
> no database growth will occur from that.

You said your site is completely dynamic. So you probably have other
database requests per page view. This is the point where I would start
to optimize. IMHO that will bring you performance very fast.
On many pages of my portal, in the ideal case there is only one select
to the session table, all other things are cached (of course this
counts only for overview data and forum threads that were recently viewed,
but these are the pages which have the most views).

A session table is quite small and your selects always use an indexed
column. You might even be able to separate the sessions into several
tables/databases (split by the first character of the sid, for
example), which enables you to spread them across different servers
without the tradeoff of replication.
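
A sketch of such a split, assuming hexadecimal session IDs; the DSNs and the function name are hypothetical placeholders:

```perl
use strict;
use warnings;

# Hypothetical session databases; each one holds a slice of the sessions.
my @SESSION_DSNS = (
    'dbi:mysql:database=sessions0;host=db0.example.com',
    'dbi:mysql:database=sessions1;host=db1.example.com',
);

# Pick the database from the first character of the session ID.
sub dsn_for_sid {
    my ($sid) = @_;
    die "session id must start with a hex digit\n"
        unless $sid =~ /\A([0-9a-fA-F])/;
    return $SESSION_DSNS[ hex($1) % @SESSION_DSNS ];
}

print dsn_for_sid('f3a9c0'), "\n";
```

Every web server has to use the same list in the same order, otherwise a session written through one server will not be found by another.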

- --
http://darkdance.net/
http://perlpunks.de/
http://www.trashcave.de/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Made with pgp4pine 1.76

iEYEARECAAYFAkq1JlUACgkQ8ezKMar1ua1nZgCgrEcvGn8FmKfQ+0Bo0Sgs dBHt
+RgAn1F/+QTJew5RYtcaxMOj7Ac4a/Od
=jQVj
-----END PGP SIGNATURE-----

Re: Ways to scale a mod_perl site

am 20.09.2009 00:31:09 von Bill Moseley

--00c09f9231311f17e10473f5ce73
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

On Sat, Sep 19, 2009 at 11:43 AM, Tina Müller wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On Fri, 18 Sep 2009, Igor Chudov wrote:
>
>> On Fri, Sep 18, 2009 at 10:13 AM, Tina Mueller wrote:
>>
>>> How does the user invalidate that "session"? (in case the cookie leaked
>>> or something like that). Or how can the website owner log out a certain
>>> user?
>>
>> Same way you do with a table: when the user logs out, you update their
>> cookie to a new one, where "userid" is not set.
>
> That doesn't invalidate the cookie.
> It resets the cookie in the browser, but the string itself is still a valid
> session and can be reused.
>

That's why you have an expires time in the cookie data. On each request you
check it and extend it. Then if you see one that's past the expires time you
require authentication again.
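
The check-and-extend step might look roughly like this, assuming the session data has already been verified and decoded into a hash with an "expires" field. The field name, the timeout and the function name are assumptions:

```perl
use strict;
use warnings;

my $TTL = 1800;  # hypothetical idle timeout in seconds

# Given verified session data and the current time, return a copy with the
# expiry pushed forward, or undef if the session has expired and the user
# has to authenticate again.
sub refresh_session {
    my ($session, $now) = @_;
    return undef if !defined $session->{expires} || $session->{expires} < $now;
    return +{ %$session, expires => $now + $TTL };
}
```

The refreshed hash is then signed again and sent back as the new cookie, so an active user never hits the expiry while an idle cookie goes stale.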

"Logged out" is a fuzzy concept. If it means the user must provide
credentials again then you flag logged out in the cookie and then it will
appear to the user that they are logged out. Sure, if they copy the cookie
some place, log out, then they can use the cookie again seemingly w/o
logging in. But it's just an appearance. Logging in just means they have
provided the credentials and you have given them a temporary token (the
cookie) that says they don't need to re-authenticate every request. It's a
free pass for the time allowed (regardless of the log out).

If you have much stricter business needs around "logging out" or need a way
to immediately disable a user then you need to track that elsewhere -- set a
flag in memcached or use the db.
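
A sketch of such a flag, with a plain hash standing in for memcached or a database table (a hash only works inside one process, so a real setup needs a store all web servers share). It assumes the cookie carries the time it was issued; the names are made up:

```perl
use strict;
use warnings;

# Stand-in for a shared store: userid => epoch time of the forced logout.
my %revoked_before;

# Disable every cookie issued to this user up to now.
sub revoke_user {
    my ($userid, $now) = @_;
    $revoked_before{$userid} = $now;
    return;
}

# True if a cookie issued at $issued_at must be rejected for this user.
sub is_revoked {
    my ($userid, $issued_at) = @_;
    my $cutoff = $revoked_before{$userid};
    return defined($cutoff) && $issued_at <= $cutoff ? 1 : 0;
}
```

Only revoked users have an entry, so the lookup stays small, but it is still one extra check per request, which is the cost being discussed in this thread.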



> Since there is nothing stored about it server side the server just gets
> the session string from the client and doesn't care (doesn't know) if
> any browser "logged out".
>
> And storing the IP in the session wouldn't work for users that get a
> new IP very often. On the other hand, several users might have the
> same IP in the view of the server.


Right, IPs are not much good. I use them sometimes to force a captcha if
too many failed logins come from the same IP.


--
Bill Moseley
moseley@hank.org

--00c09f9231311f17e10473f5ce73--